Building a Large-Scale Japanese CFG for Syntactic Parsing

Authors

  • Tomoya Noro
  • Taiichi Hashimoto
  • Takenobu Tokunaga
  • Hozumi Tanaka
Abstract

Large-scale grammars are a prerequisite for parsing a great variety of sentences, but it is difficult to build such grammars by hand. It is possible, however, to derive a context-free grammar (CFG) automatically from an existing large-scale, syntactically annotated corpus. Although this seems a simple task at first sight, CFGs derived in this fashion have hardly ever been applied to existing systems. This is probably due to the great number of possible outputs, i.e. parse results (high ambiguity). In this paper, we analyze some causes of this high ambiguity and propose a policy for building a large-scale Japanese CFG for syntactic parsing that is capable of decreasing ambiguity. We end the paper with an experimental evaluation of the obtained CFG.
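To make the idea of deriving a CFG from an annotated corpus concrete, here is a minimal sketch of reading rules off parse trees. It is not the authors' procedure: the bracketed tree format, the category labels (S, PP, VP, N, P, V), the romanized toy sentences, and the function names `parse_bracketed`, `extract_rules`, and `derive_grammar` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy treebank in a Penn-style bracketed format (an assumption;
# the abstract does not specify the corpus format).
TOY_TREEBANK = [
    "(S (PP (N taro) (P ga)) (VP (V hashiru)))",
    "(S (PP (N hanako) (P ga)) (VP (PP (N hon) (P wo)) (V yomu)))",
]


def parse_bracketed(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "a node must start with '('"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # terminal word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read_node()


def extract_rules(tree, rules):
    """Count one rule per internal node; lexical (tag -> word) rules are skipped."""
    label, children = tree
    if all(isinstance(child, str) for child in children):
        return  # preterminal node: only dominates words
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    rules[(label, rhs)] += 1
    for child in children:
        if isinstance(child, tuple):
            extract_rules(child, rules)


def derive_grammar(treebank):
    """Return a Counter mapping (lhs, rhs) rules to their corpus frequency."""
    rules = Counter()
    for line in treebank:
        extract_rules(parse_bracketed(line), rules)
    return rules


if __name__ == "__main__":
    for (lhs, rhs), count in derive_grammar(TOY_TREEBANK).most_common():
        print(f"{lhs} -> {' '.join(rhs)}    [{count}]")
```

Even in this toy case, VP is expanded in two different ways. On a large corpus the number of competing rules per category grows quickly, which is one way the high ambiguity mentioned in the abstract can arise.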

Similar resources

Evaluation of a Japanese CFG Derived from a Syntactically Annotated Corpus with Respect to Dependency Measures

Parsing is one of the important processes for natural language processing and, in general, a large-scale CFG is used to parse a wide variety of sentences. For many languages, a CFG is derived from a large-scale syntactically annotated corpus, and many parsing algorithms using CFGs have been proposed. However, we could not apply them to Japanese since a Japanese syntactically annotated corpus ha...

A Large-Scale Japanese CFG Derived from a Syntactically Annotated Corpus and Its Evaluation

Although large-scale grammars are a prerequisite for parsing a great variety of sentences, it is difficult to build such grammars by hand. Yet, it is possible to build a context-free grammar (CFG) by deriving it from a syntactically annotated corpus. Many such corpora have been built recently to obtain statistical information concerning corpus-based NLP technologies. For English, it is well known...

Incremental CFG Parsing with Statistical Lexical Dependencies

Incremental parsing with a context-free grammar produces partial syntactic structures for an initial fragment on a word-by-word basis. Owing to syntactic ambiguity, however, too many structures are produced, and parsing therefore becomes very slow. This paper describes a technique for efficient incremental parsing using lexical information. The probability concerning dependencie...

Treebank-Based Acquisition of LFG Parsing Resources for French

Motivated by the expense in time and other resources to produce hand-crafted grammars, there has been increased interest in automatically obtained wide-coverage grammars from treebanks for natural language processing. In particular, recent years have seen the growth in interest in automatically obtained deep resources that can represent information absent from simple CFG-type structured treeban...

An improved joint model: POS tagging and dependency parsing

Dependency parsing is a form of syntactic parsing of natural language that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-of-speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging and dependency parsing in a pipeline. Unfortunately, in pipel...

Publication date: 2004